background-image: url(svg/coding.svg)
background-size: 500px
background-position: 50% 50%
class: center, middle, inverse

# Assignment 002

---

# Examine an OTU Table in R

.pad-left[

- Go to the Human Microbiome Project (HMP): [https://www.hmpdacc.org/HMQCP/](https://www.hmpdacc.org/HMQCP/)

- Examine the datasets available

- Clone the second class repository: [https://github.com/bjklab/EPID674_002_sequences-to-counts.git](https://github.com/bjklab/EPID674_002_sequences-to-counts.git)

- Install & load the packages needed to read the HMP OTU table (included in the second class repository)

- View the data & complete a brief assignment

]

---
background-image: url(img/hmpdacc.png)
background-size: contain

---
background-image: url(img/hmpdacc_otus.png)
background-size: contain

---
background-image: url(img/rstudiocloud_repo.png)
background-size: contain

---

# Using the [rstudio.cloud](https://rstudio.cloud) console

.pull-left[

```r
# install necessary packages
# (first time only)
# (this will take a while)
install.packages('tidyverse')

# load tidyverse functions
library(tidyverse)
```

]

.pull-right[

- The code at left installs and loads the "tidyverse" package

- The "tidyverse" package bundles a set of other packages that permit streamlined data processing

- See Hadley Wickham's _R for Data Science_: [https://r4ds.had.co.nz/](https://r4ds.had.co.nz/)

]

---
exclude: TRUE

.pull-left[

```r
# make sure tidyverse is loaded
library(tidyverse)

# load (trimmed) HMP V1-V3 OTU table
# (downloaded with Class 2 repository)
otu <- read_tsv(
  file = "./data/otu_table_psn_v13_TRIMMED.txt.gz"
)

# note: full HMP table crashes rstudio.cloud
# but if you're using your own computer...
#
# otu <- read_tsv(
#   file = "./data/otu_table_psn_v13.txt.gz",
#   skip = 1 # skips an empty first row
# )

# View(otu) # can try "View" in RStudio
otu # show what you've read
```

]

.pull-right[

```
## # A tibble: 43,140 x 22
##    `#OTU ID`    `700110831` `700021898` `700113546` `700016133` `700113545`
##    <chr>              <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
##  1 OTU_97.1               0           0           0           0           0
##  2 OTU_97.10              0           0           0           0           0
##  3 OTU_97.100             0           0           0           0           0
##  4 OTU_97.1000            0           0           0           0           0
##  5 OTU_97.10000           0           0           0           0           0
##  6 OTU_97.10001           0           0           0           0           0
##  7 OTU_97.10002           0           0           0           0           0
##  8 OTU_97.10003           0           0           0           0           0
##  9 OTU_97.10004           0           0           0           0           0
## 10 OTU_97.10005           0           0           0           0           0
## # … with 43,130 more rows, and 16 more variables: 700015293 <dbl>,
## #   700103483 <dbl>, 700097251 <dbl>, 700024697 <dbl>, 700111520 <dbl>,
## #   700038545 <dbl>, 700037298 <dbl>, 700023638 <dbl>, 700016135 <dbl>,
## #   700038435 <dbl>, 700111591 <dbl>, 700109465 <dbl>, 700106547 <dbl>,
## #   700100244 <dbl>, 700114014 <dbl>, Consensus Lineage <chr>
```

]

---

```r
# load (trimmed) HMP V1-V3 OTU table
# (downloaded with Class 2 repository)
otu <- read_tsv(
  file = "./data/otu_table_psn_v13_TRIMMED.txt.gz"
)

# View(otu) # try the "View" function in RStudio
otu # show what you've read
```

```
## # A tibble: 43,140 x 22
##    `#OTU ID`    `700110831` `700021898` `700113546` `700016133` `700113545`
##    <chr>              <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
##  1 OTU_97.1               0           0           0           0           0
##  2 OTU_97.10              0           0           0           0           0
##  3 OTU_97.100             0           0           0           0           0
##  4 OTU_97.1000            0           0           0           0           0
##  5 OTU_97.10000           0           0           0           0           0
##  6 OTU_97.10001           0           0           0           0           0
##  7 OTU_97.10002           0           0           0           0           0
##  8 OTU_97.10003           0           0           0           0           0
##  9 OTU_97.10004           0           0           0           0           0
## 10 OTU_97.10005           0           0           0           0           0
## # … with 43,130 more rows, and 16 more variables: 700015293 <dbl>,
## #   700103483 <dbl>, 700097251 <dbl>, 700024697 <dbl>, 700111520 <dbl>,
## #   700038545 <dbl>, 700037298 <dbl>, 700023638 <dbl>, 700016135 <dbl>,
## #   700038435 <dbl>, 700111591 <dbl>, 700109465 <dbl>, 700106547 <dbl>,
## #   700100244 <dbl>, 700114014 <dbl>, Consensus Lineage <chr>
```

---

```r
# note: full HMP table crashes rstudio.cloud
# but if you're using your own computer...

otu <- read_tsv(
  file = "./data/otu_table_psn_v13.txt.gz",
  skip = 1 # skips an empty first row
)

# View(otu) # try the "View" function in RStudio
otu # show what you've read
```

---
exclude: TRUE

.pull-left[

```r
# make sure tidyverse is loaded
library(tidyverse)

# load HMP specimen data
# (downloaded with Class 2 repository)
specimens <- read_tsv(
  file = "./data/v13_map_uniquebyPSN.txt.bz2"
)

specimens # show what you've read
# View(specimens) # also try the "View" function in RStudio
```

]

.pull-right[

```
## # A tibble: 2,970 x 11
##    `#SampleID`   RSID visitno sex   RUNCENTER HMPbodysubsite Description  X8
##          <dbl>  <dbl>   <dbl> <chr> <chr>     <chr>          <chr>        <lgl>
##  1   700013549 1.58e8       1 fema… BCM       Stool          HMP_Human_m… NA
##  2   700014386 1.58e8       1 male  BCM,BI    Stool          HMP_Human_m… NA
##  3   700014403 1.58e8       1 male  BCM,BI    Saliva         HMP_Human_m… NA
##  4   700014409 1.58e8       1 male  BCM,BI    Tongue_dorsum  HMP_Human_m… NA
##  5   700014412 1.58e8       1 male  BCM,BI    Hard_palate    HMP_Human_m… NA
##  6   700014415 1.58e8       1 male  BCM,BI    Buccal_mucosa  HMP_Human_m… NA
##  7   700014418 1.58e8       1 male  BCM,BI    Attached_Kera… HMP_Human_m… NA
##  8   700014421 1.58e8       1 male  BCM,BI    Palatine_Tons… HMP_Human_m… NA
##  9   700014424 1.58e8       1 male  BCM,BI    Throat         HMP_Human_m… NA
## 10   700014427 1.58e8       1 male  BCM,BI    Supragingival… HMP_Human_m… NA
## # … with 2,960 more rows, and 3 more variables: X9 <lgl>, X10 <lgl>, X11 <dbl>
```

]

---

```r
# load HMP specimen data
# (downloaded with Class 2 repository)
specimens <- read_tsv(
  file = "./data/v13_map_uniquebyPSN.txt.bz2"
)

specimens # show what you've read
```

```
## # A tibble: 2,970 x 11
##    `#SampleID`   RSID visitno sex   RUNCENTER HMPbodysubsite Description  X8
##          <dbl>  <dbl>   <dbl> <chr> <chr>     <chr>          <chr>        <lgl>
##  1   700013549 1.58e8       1 fema… BCM       Stool          HMP_Human_m… NA
##  2   700014386 1.58e8       1 male  BCM,BI    Stool          HMP_Human_m… NA
##  3   700014403 1.58e8       1 male  BCM,BI    Saliva         HMP_Human_m… NA
##  4   700014409 1.58e8       1 male  BCM,BI    Tongue_dorsum  HMP_Human_m… NA
##  5   700014412 1.58e8       1 male  BCM,BI    Hard_palate    HMP_Human_m… NA
##  6   700014415 1.58e8       1 male  BCM,BI    Buccal_mucosa  HMP_Human_m… NA
##  7   700014418 1.58e8       1 male  BCM,BI    Attached_Kera… HMP_Human_m… NA
##  8   700014421 1.58e8       1 male  BCM,BI    Palatine_Tons… HMP_Human_m… NA
##  9   700014424 1.58e8       1 male  BCM,BI    Throat         HMP_Human_m… NA
## 10   700014427 1.58e8       1 male  BCM,BI    Supragingival… HMP_Human_m… NA
## # … with 2,960 more rows, and 3 more variables: X9 <lgl>, X10 <lgl>, X11 <dbl>
```

```r
# View(specimens) # also try the "View" function in RStudio
```

---

# Questions

.pad-left[

- How many OTUs (rows) are in the HMP V1-V3 OTU table?

- How many specimens (rows) are in the HMP specimen map?

- Which specimen types are included? (try the "View" function)

- Do you see any other OTU tables on the HMP DACC portal?

]

---
background-image: url(svg/coding.svg)
background-size: 500px
background-position: 50% 50%
class: center, middle, inverse

# Done!

### Post questions to the discussion board!

---
background-image: url(svg/bacteria.svg)
background-size: 100px
background-position: 98% 90%
class: center, middle

# Thank you!

#### Slides available: [github.com/bjklab](https://github.com/bjklab/EPID674_002_sequences-to-counts.git)

#### [brendank@pennmedicine.upenn.edu](mailto:brendank@pennmedicine.upenn.edu)
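
---

# Appendix: counting rows and specimen types

Besides "View", the row and specimen-type questions can also be approached in code. The block below is only a minimal sketch: `otu_demo` and `specimens_demo` are made-up stand-in tables (hypothetical values, not HMP data) that mimic the column layout shown earlier. To use it on the real data, substitute the `otu` and `specimens` objects you read with `read_tsv()`.

```r
library(tidyverse)

# Hypothetical stand-in for the OTU table: one row per OTU,
# one column per specimen (values invented for illustration)
otu_demo <- tibble(
  `#OTU ID` = c("OTU_97.1", "OTU_97.10", "OTU_97.100"),
  `700110831` = c(0, 3, 0),
  `700021898` = c(1, 0, 0)
)

# Hypothetical stand-in for the specimen map: one row per specimen
specimens_demo <- tibble(
  `#SampleID` = c(700110831, 700021898, 700013549),
  HMPbodysubsite = c("Stool", "Saliva", "Stool")
)

# number of OTUs = number of rows in the OTU table
n_otus <- nrow(otu_demo)

# number of specimens = number of rows in the specimen map
n_specimens <- nrow(specimens_demo)

# specimen types and how many specimens of each, most common first
site_counts <- count(specimens_demo, HMPbodysubsite, sort = TRUE)

n_otus
n_specimens
site_counts
```

`count()` returns one row per distinct `HMPbodysubsite` value, so the number of rows in `site_counts` is the number of specimen types.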